Keywords, k-NN and Neural Networks: a Support for Hierarchical Categorization of Texts in Brazilian Portuguese
نویسندگان
چکیده
A frequent problem in automatic categorization applications involving Portuguese language is the absence of large corpora of previously classified documents, which permit the validation of experiments carried out. Generally, the available corpora are not classified or, when they are, they contain a very reduced number of documents. The general goal of this study is to contribute to the development of applications which aim at text categorization for Brazilian Portuguese. Specifically, we point out that keywords selection associated with neural networks can improve results in the categorization of Brazilian Portuguese texts. The corpus is composed of 30 thousand texts from the Folha de São Paulo newspaper, organized in 29 sections. In the process of categorization, the k-Nearest Neighbor (k-NN) algorithm and the Multilayer Perceptron neural networks trained with the backpropagation algorithm are used. It is also part of our study to test the identification of keywords parting from the log-likelihood statistical measure and to use them as features in the categorization process. The results clearly show that the precision is better when using neural networks than when using the k-NN.
منابع مشابه
‘Minor’ Languages, ‘Broken’ Translations: On Brazilian Reworkings of an Albanian Novel
This essay approaches the challenges of global translation in the 21st century from what might still be considered a somewhat uncommon example: a direct translation of Ismail Kadaré's 1978 novel Prill e thyër (Broken April) from the original Albanian into Brazilian Portuguese in 2001. Not only does it examine and compare lexical elements in the source and target texts and the usage of translato...
متن کاملApplication of Wavelet Neural Networks for Improving of Ionospheric Tomography Reconstruction over Iran
In this paper, a new method of ionospheric tomography is developed and evaluated based on the neural networks (NN). This new method is named ITNN. In this method, wavelet neural network (WNN) with particle swarm optimization (PSO) training algorithm is used to solve some of the ionospheric tomography problems. The results of ITNN method are compared with the residual minimization training neura...
متن کاملPrediction of true critical temperature and pressure of binary hydrocarbon mixtures: A Comparison between the artificial neural networks and the support vector machine
Two main objectives have been considered in this paper: providing a good model to predict the critical temperature and pressure of binary hydrocarbon mixtures, and comparing the efficiency of the artificial neural network algorithms and the support vector regression as two commonly used soft computing methods. In order to have a fair comparison and to achieve the highest efficiency, a comprehen...
متن کاملConstruction of Neural Networks that Do Not Have Critical Points Based on Hierarchical Structure
a critical point is a point at which the derivatives of an error function are all zero. It has been shown in the literature that critical points caused by the hierarchical structure of a realvalued neural network (NN) can be local minima or saddle points, although most critical points caused by the hierarchical structure are saddle points in the case of complex-valued neural networks. Several s...
متن کاملPerformance Analysis of Naiotave Bayes Classification, Support Vector Machines and Neural Networks for Spam Categorization
Spam mail recognition is a new growing field which brings together the topic of natural language processing and machine learning as it is in essence a two class classification of natural language texts. An important feature of spam recognition is that it is a cost-sensitive classification: misclassification of a non-spam mail as spam is generally a more severe error than misclassifying a spam m...
متن کامل